Audio context recognition in variable mobile environments from short segments using speaker and language recognizers
Authors
Abstract
The problem of context recognition from mobile audio data is considered. We consider ten different audio contexts (such as car, bus, office and outdoors) prevalent in daily life situations. We choose mel-frequency cepstral coefficient (MFCC) parametrization and present an extensive comparison of six different classifiers: k-nearest neighbor (kNN), vector quantization (VQ), Gaussian mixture model trained with both maximum likelihood (GMM-ML) and maximum mutual information (GMM-MMI) criteria, GMM supervector support vector machine (GMM-SVM) and, finally, SVM with generalized linear discriminant sequence (GLDS-SVM). After all parameter optimizations, the GMM-MMI and VQ classifiers perform the best, with 52.01% and 50.34% context identification rates, respectively, using 3-second data records. Our analysis further reveals that none of the six classifiers is superior to the others when class-, user- or phone-specific accuracies are considered.
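As a rough illustration of the VQ classifier idea named in the abstract, the sketch below trains one k-means codebook per context and labels a short segment by the codebook giving the lowest average quantization distortion. It is a minimal sketch under stated assumptions, not the authors' implementation: the 2-D toy vectors stand in for MFCC frames (feature extraction is not shown), and the function names, codebook size and context labels are hypothetical choices made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(frames, size, iters=10, seed=0):
    """Build a VQ codebook with plain k-means (Lloyd iterations)."""
    rng = random.Random(seed)
    codebook = [list(f) for f in rng.sample(frames, size)]
    for _ in range(iters):
        clusters = [[] for _ in codebook]
        for f in frames:
            nearest = min(range(size), key=lambda k: sq_dist(f, codebook[k]))
            clusters[nearest].append(f)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                codebook[k] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


def distortion(frames, codebook):
    """Average distance from each frame to its nearest code vector."""
    return sum(min(sq_dist(f, c) for c in codebook) for f in frames) / len(frames)


def classify(frames, codebooks):
    """Return the context label whose codebook fits the segment best."""
    return min(codebooks, key=lambda label: distortion(frames, codebooks[label]))


def toy_frames(center, n, rng):
    """Hypothetical stand-in for MFCC frames: Gaussian cloud around a center."""
    return [[rng.gauss(c, 0.5) for c in center] for _ in range(n)]


rng = random.Random(1)
codebooks = {
    "car": train_codebook(toy_frames([0.0, 0.0], 200, rng), size=4),
    "office": train_codebook(toy_frames([5.0, 5.0], 200, rng), size=4),
}
segment = toy_frames([5.0, 5.0], 30, rng)  # short test segment
predicted = classify(segment, codebooks)
print(predicted)
```

In a real system the toy vectors would be replaced by MFCC frames from a short recording, and the codebook size would be one of the parameters to tune.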
Similar resources
Speaking rate normalization with lattice-based context-dependent phoneme duration modeling for personalized speech recognizers on mobile devices
Voice access of cloud applications including social networks using mobile devices becomes attractive today. And personalized speech recognizers over mobile devices become feasible because most mobile devices have only a single user. Speaking rate variation is known to be an important source of performance degradation for spontaneous speech recognition. Speaking rate is speaker dependent, it cha...
Modeling nuisance variabilities with factor analysis for GMM-based audio pattern classification
Audio pattern classification represents a particular statistical classification task and includes, for example, speaker recognition, language recognition, emotion recognition, speech recognition and, recently, video genre classification. The feature being used in all these tasks is generally based on a short-term cepstral representation. The cepstral vectors contain at the same time useful info...
Comparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts
This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as a Foreign Language (EFL) learners' collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...
Multimedia Information Access Using Multiple Speaker Classifiers
There have been several new systems for multimedia information access reported in recent years. The system presented here shares many of their aspects, but it differs in a significant way from them; it extends the realm of multimedia access to include speaker-based information. We have already prototyped and reported such a system elsewhere whose main features include SVAPI-based speaker recogn...
Multi-modal user authentication from video for mobile or variable-environment applications
In this study, we apply a combination of face and speaker identification techniques to the task of multi-modal (i.e., multi-biometric) user authentication for mobile or variable-environment applications. Audio-visual data was collected using a web camera connected to a laptop computer in three different environments: a quiet indoor office, a busy indoor cafe, and near a noisy outdoor street inte...